Overview

Dataset info

Number of variables: 57
Number of observations: 28235
Missing cells: 448424 (27.9%)
Duplicate rows: 0 (0.0%)
Total size in memory: 9.5 MiB
Average record size in memory: 352.0 B
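
The missing-cell percentage can be cross-checked from the other figures in this table. A minimal sketch, assuming the percentage is missing cells divided by (variables × observations); the variable names are illustrative only:

```python
# Figures copied from the "Dataset info" table.
n_variables = 57
n_observations = 28235
missing_cells = 448424

# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells in total, {missing_pct:.1f}% missing")
```

Under that assumption the result agrees with the 27.9% shown in the report.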

Variables types

Numeric: 8
Categorical: 5
Boolean: 0
Date: 0
URL: 0
Text (Unique): 0
Rejected: 44
Unsupported: 0

Warnings

city_levenshtein_simple has 18290 (64.8%) missing values Missing
city_levenshtein_simple_bin is highly correlated with city_levenshtein_simple (ρ = 0.9850689705) Rejected
city_levenshtein_term is highly correlated with city_levenshtein_simple_bin (ρ = 0.9488635718) Rejected
city_levenshtein_term_bin is highly correlated with city_levenshtein_term (ρ = 0.9930704216) Rejected
city_trigram_simple is highly correlated with city_levenshtein_term_bin (ρ = 0.9252824659) Rejected
city_trigram_simple_bin is highly correlated with city_trigram_simple (ρ = 0.9937484473) Rejected
city_trigram_term is highly correlated with city_trigram_simple_bin (ρ = 0.9305819585) Rejected
city_trigram_term_bin is highly correlated with city_trigram_term (ρ = 0.9967797356) Rejected
fax_levenshtein has 27505 (97.4%) missing values Missing
fax_levenshtein_bin is highly correlated with fax_levenshtein (ρ = 0.988964125) Rejected
fax_trigram is highly correlated with fax_levenshtein_bin (ρ = 0.9516868396) Rejected
fax_trigram_bin is highly correlated with fax_trigram (ρ = 0.9925748954) Rejected
id has a high cardinality: 27821 distinct values Warning
name_levenshtein_simple_bin is highly correlated with name_levenshtein_simple (ρ = 0.9735343526) Rejected
name_levenshtein_term_bin is highly correlated with name_levenshtein_term (ρ = 0.9805968951) Rejected
name_trigram_simple is highly correlated with name_levenshtein_simple_bin (ρ = 0.9636431124) Rejected
name_trigram_simple_bin is highly correlated with name_trigram_simple (ρ = 0.9845669582) Rejected
name_trigram_term is highly correlated with name_trigram_simple_bin (ρ = 0.9446397504) Rejected
name_trigram_term_bin is highly correlated with name_trigram_term (ρ = 0.9865189142) Rejected
phone_levenshtein has 16371 (58.0%) missing values Missing
phone_levenshtein_bin is highly correlated with phone_levenshtein (ρ = 0.9901385307) Rejected
phone_trigram is highly correlated with phone_levenshtein_bin (ρ = 0.9515029939) Rejected
phone_trigram_bin is highly correlated with phone_trigram (ρ = 0.9915532963) Rejected
street_levenshtein_simple has 19997 (70.8%) missing values Missing
street_levenshtein_simple_bin is highly correlated with street_levenshtein_simple (ρ = 0.977604212) Rejected
street_levenshtein_term is highly correlated with street_levenshtein_simple_bin (ρ = 0.9253512513) Rejected
street_levenshtein_term_bin is highly correlated with street_levenshtein_term (ρ = 0.9816441933) Rejected
street_number_levenshtein is highly correlated with street_number_equality (ρ = 0.9128880932) Rejected
street_number_levenshtein_bin is highly correlated with street_number_levenshtein (ρ = 0.9916964742) Rejected
street_number_trigram is highly correlated with street_number_levenshtein_bin (ρ = 0.941348484) Rejected
street_number_trigram_bin is highly correlated with street_number_trigram (ρ = 0.996232939) Rejected
street_trigram_simple is highly correlated with street_levenshtein_term_bin (ρ = 0.931397809) Rejected
street_trigram_simple_bin is highly correlated with street_trigram_simple (ρ = 0.9874532905) Rejected
street_trigram_term is highly correlated with street_trigram_simple_bin (ρ = 0.9655146219) Rejected
street_trigram_term_bin is highly correlated with street_trigram_term (ρ = 0.9886679289) Rejected
website_levenshtein_simple has 26416 (93.6%) missing values Missing
website_levenshtein_simple_bin is highly correlated with website_levenshtein_simple (ρ = 0.9724771988) Rejected
website_levenshtein_term is highly correlated with website_levenshtein_simple_bin (ρ = 0.9296893974) Rejected
website_levenshtein_term_bin is highly correlated with website_levenshtein_term (ρ = 0.9853659607) Rejected
website_trigram_simple is highly correlated with website_levenshtein_term_bin (ρ = 0.9424270326) Rejected
website_trigram_simple_bin is highly correlated with website_trigram_simple (ρ = 0.9746094625) Rejected
website_trigram_term is highly correlated with website_trigram_simple_bin (ρ = 0.9245284537) Rejected
website_trigram_term_bin is highly correlated with website_trigram_term (ρ = 0.9853001957) Rejected
zip_levenshtein_simple has 299 (1.1%) zeros Zeros
zip_levenshtein_simple has 20539 (72.7%) missing values Missing
zip_levenshtein_simple_bin is highly correlated with zip_levenshtein_simple (ρ = 0.9674268288) Rejected
zip_levenshtein_term is highly correlated with zip_levenshtein_simple_bin (ρ = 0.9590540554) Rejected
zip_levenshtein_term_bin is highly correlated with zip_levenshtein_term (ρ = 0.9684604865) Rejected
zip_trigram_simple is highly correlated with zip_levenshtein_term (ρ = 0.9081318961) Rejected
zip_trigram_simple_bin is highly correlated with zip_trigram_simple (ρ = 0.9967649793) Rejected
zip_trigram_term is highly correlated with zip_trigram_simple_bin (ρ = 0.9924286597) Rejected
zip_trigram_term_bin is highly correlated with zip_trigram_term (ρ = 0.9967692543) Rejected
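
Every Rejected warning above reports a correlation above 0.9 (the smallest is 0.9081318961, for zip_trigram_simple), which suggests a cut-off near that value. The sketch below illustrates that kind of rule on toy data. The 0.9 threshold, the function names and the toy columns are assumptions for illustration; the report does not show how the profiler actually decides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else 0.0

def flag_correlated(columns, threshold=0.9):
    """Map each flagged column to (earlier column, rho).

    A column is flagged when its absolute correlation with any
    earlier column exceeds the (assumed) threshold.
    """
    flagged = {}
    names = list(columns)
    for i, name in enumerate(names):
        for earlier in names[:i]:
            rho = pearson(columns[earlier], columns[name])
            if abs(rho) > threshold:
                flagged[name] = (earlier, rho)
                break
    return flagged

# Toy data, not taken from the report: a score, a binned copy, an unrelated flag.
toy = {
    "score": [0.1, 0.3, 0.5, 0.7, 0.9, 1.0],
    "score_bin": [0, 1, 2, 3, 4, 4],
    "other": [1, 0, 1, 0, 0, 1],
}
print(flag_correlated(toy))
```

On the toy data only `score_bin` is flagged, mirroring how each `_bin` column in the warnings is rejected against the score it was derived from.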

Variables

city_levenshtein_simple
Numeric

Distinct count: 218
Unique (%): 0.8%
Missing (%): 64.8%
Missing (n): 18290
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.8824213147
Minimum: 0
Maximum: 1
Zeros (%): 0.5%

Quantile statistics

Minimum: 0
5-th percentile: 0.1793719977
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range: 0

Descriptive statistics

Standard deviation: 0.2514716387
Coef of variation: 0.284979105
Kurtosis: 3.665280581
Mean: 0.8824213147
MAD: 0.1819805503
Skewness: -2.17834568
Sum: 8775.679688
Variance: 0.06323798746
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     7685   27.2%
0.6666669846          770    2.7%
0.5714290142          211    0.7%
0.8333330154          134    0.5%
0                     133    0.5%
0.1000000015          54     0.2%
0.5                   44     0.2%
0.1666669995          44     0.2%
0.125                 37     0.1%
0.25                  36     0.1%
Other values (207)    797    2.8%
(Missing)             18290  64.8%

Minimum 5 values

Value                 Count  Frequency (%)
0                     133    0.5%
0.05000000075         1      < 0.1%
0.05555560067         3      < 0.1%
0.05714289844         1      < 0.1%
0.06060609967         1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     7685   27.2%
0.962962985           2      < 0.1%
0.9583330154          4      < 0.1%
0.9523810148          2      < 0.1%
0.9444440007          3      < 0.1%
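
Several of the figures reported for city_levenshtein_simple are tied together by simple identities (variance is the squared standard deviation, the coefficient of variation is standard deviation over mean, and so on). A small sketch that checks them against the reported values, assuming the mean and sum are computed over non-missing rows only; the tolerance is an arbitrary choice to absorb rounding in the printed figures:

```python
import math

# Values copied from the city_levenshtein_simple section.
n_observations = 28235
n_missing = 18290
mean = 0.8824213147
std = 0.2514716387
variance = 0.06323798746
coef_of_variation = 0.284979105
total = 8775.679688
minimum, maximum, value_range = 0.0, 1.0, 1.0
q1, q3, iqr = 1.0, 1.0, 0.0

n_present = n_observations - n_missing  # rows with a non-missing score

def close(a, b):
    # Relative tolerance chosen loosely; absolute tolerance covers exact zeros.
    return math.isclose(a, b, rel_tol=1e-6, abs_tol=1e-12)

checks = {
    "variance = std^2": close(std ** 2, variance),
    "coef of variation = std / mean": close(std / mean, coef_of_variation),
    "sum = mean * non-missing count": close(mean * n_present, total),
    "range = max - min": close(maximum - minimum, value_range),
    "IQR = Q3 - Q1": close(q3 - q1, iqr),
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'inconsistent'}")
```

The same checks can be repeated for the other numeric variables by swapping in their reported values.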
 

city_levenshtein_simple_bin
Highly correlated

This variable is highly correlated with city_levenshtein_simple and should be ignored for analysis

Correlation: 0.9850689705

city_levenshtein_term
Highly correlated

This variable is highly correlated with city_levenshtein_simple_bin and should be ignored for analysis

Correlation: 0.9488635718

city_levenshtein_term_bin
Highly correlated

This variable is highly correlated with city_levenshtein_term and should be ignored for analysis

Correlation: 0.9930704216

city_trigram_simple
Highly correlated

This variable is highly correlated with city_levenshtein_term_bin and should be ignored for analysis

Correlation: 0.9252824659

city_trigram_simple_bin
Highly correlated

This variable is highly correlated with city_trigram_simple and should be ignored for analysis

Correlation: 0.9937484473

city_trigram_term
Highly correlated

This variable is highly correlated with city_trigram_simple_bin and should be ignored for analysis

Correlation: 0.9305819585

city_trigram_term_bin
Highly correlated

This variable is highly correlated with city_trigram_term and should be ignored for analysis

Correlation: 0.9967797356

fax_equality
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
0      27505  97.4%
2      386    1.4%
1      344    1.2%

Max length: 1
Mean length: 1
Min length: 1
Contains chars: False
Contains digits: True
Contains spaces: False
Contains non-words: False

fax_levenshtein
Numeric

Distinct count: 19
Unique (%): 0.1%
Missing (%): 97.4%
Missing (n): 27505
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.7646326423
Minimum: 0.1000000015
Maximum: 1
Zeros (%): 0.0%

Quantile statistics

Minimum: 0.1000000015
5-th percentile: 0.3000000119
Q1: 0.5
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 0.8999999762
Interquartile range: 0.5

Descriptive statistics

Standard deviation: 0.2792602479
Coef of variation: 0.3652214706
Kurtosis: -1.050391316
Mean: 0.7646326423
MAD: 0.2558975518
Skewness: -0.6480944753
Sum: 558.1818237
Variance: 0.07798628509
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=19)
Value                 Count  Frequency (%)
1                     386    1.4%
0.5                   44     0.2%
0.400000006           41     0.1%
0.6000000238          38     0.1%
0.5454545617          37     0.1%
0.4545454681          30     0.1%
0.3000000119          29     0.1%
0.6999999881          22     0.1%
0.7272727489          21     0.1%
0.200000003           17     0.1%
Other values (8)      65     0.2%
(Missing)             27505  97.4%

Minimum 5 values

Value                 Count  Frequency (%)
0.1000000015          12     < 0.1%
0.200000003           17     0.1%
0.2727272809          5      < 0.1%
0.3000000119          29     0.1%
0.3636363745          15     0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     386    1.4%
0.9090909362          13     < 0.1%
0.8999999762          3      < 0.1%
0.8181818128          3      < 0.1%
0.8000000119          3      < 0.1%
 

fax_levenshtein_bin
Highly correlated

This variable is highly correlated with fax_levenshtein and should be ignored for analysis

Correlation: 0.988964125

fax_trigram
Highly correlated

This variable is highly correlated with fax_levenshtein_bin and should be ignored for analysis

Correlation: 0.9516868396

fax_trigram_bin
Highly correlated

This variable is highly correlated with fax_trigram and should be ignored for analysis

Correlation: 0.9925748954

id
Categorical

Distinct count: 27821
Unique (%): 98.5%
Missing (%): 0.0%
Missing (n): 0

Value                 Count  Frequency (%)
11287#11288           2      < 0.1%
12999#13000           2      < 0.1%
12367#12368           2      < 0.1%
12425#12426           2      < 0.1%
12569#12570           2      < 0.1%
12018#12019           2      < 0.1%
12575#12576           2      < 0.1%
13021#13022           2      < 0.1%
12710#12711           2      < 0.1%
12450#12451           2      < 0.1%
Other values (27811)  28215  99.9%

Max length: 13
Mean length: 9.077173721
Min length: 3
Contains chars: False
Contains digits: True
Contains spaces: False
Contains non-words: True

is_match
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
1      20262  71.8%
-1     7973   28.2%

Max length: 2
Mean length: 1.282380025
Min length: 1
Contains chars: False
Contains digits: True
Contains spaces: False
Contains non-words: True

name_levenshtein_simple
Numeric

Distinct count: 3511
Unique (%): 12.4%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.6281707883
Minimum: 0
Maximum: 1
Zeros (%): 0.8%

Quantile statistics

Minimum: 0
5-th percentile: 0.1357139945
Q1: 0.3633864969
Median: 0.6666669846
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range: 0.6366135031

Descriptive statistics

Standard deviation: 0.3047668338
Coef of variation: 0.4851655662
Kurtosis: -1.23851192
Mean: 0.6281707883
MAD: 0.2647316456
Skewness: -0.2540036738
Sum: 17736.40234
Variance: 0.09288282692
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[0. 0.012987 0.04494945 0.05462965 0.05587125 ... 0.95346302 0.97321451 0.97613651 0.98863649 1. ], "bayesian blocks" binning strategy used)
Value                 Count  Frequency (%)
1                     7152   25.3%
0.6666669846          2281   8.1%
0.8000000119          1598   5.7%
0.8571429849          1028   3.6%
0.5                   957    3.4%
0.5714290142          600    2.1%
0.75                  533    1.9%
0.400000006           472    1.7%
0.8888890147          438    1.6%
0.3333329856          296    1.0%
Other values (3501)   12880  45.6%

Minimum 5 values

Value                 Count  Frequency (%)
0                     212    0.8%
0.02597399987         1      < 0.1%
0.02857140079         1      < 0.1%
0.03333330154         3      < 0.1%
0.03571429849         1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     7152   25.3%
0.9772729874          7      < 0.1%
0.9750000238          16     0.1%
0.9714289904          2      < 0.1%
0.9666669965          2      < 0.1%
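
Values such as 0.6666669846, 0.8000000119 and 0.3333329856 in the tables above look like single-precision artefacts rather than meaningful digits. The sketch below assumes the scores were rounded to six decimals and then stored as 32-bit floats; that is an inference from the printed values and from the reported memory size (110.4 KiB for 28235 values is roughly 4 bytes each), not something the report states. The helper name `as_float32` is illustrative.

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Six-decimal inputs (assumed) paired with the values printed in the report.
reported = {
    "0.666667": "0.6666669846",
    "0.8": "0.8000000119",
    "0.333333": "0.3333329856",
    "0.1": "0.1000000015",
}

for text, shown in reported.items():
    # Format with 10 significant digits, like the values shown in the report.
    as_stored = f"{as_float32(float(text)):.10g}"
    print(f"{text} -> {as_stored} (report shows {shown}, match: {as_stored == shown})")
```

If the assumption holds, scores that differ only in these trailing digits can be treated as equal to their six-decimal forms.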
 

name_levenshtein_simple_bin
Highly correlated

This variable is highly correlated with name_levenshtein_simple and should be ignored for analysis

Correlation: 0.9735343526

name_levenshtein_term
Numeric

Distinct count: 727
Unique (%): 2.6%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.5370063186
Minimum: 0
Maximum: 1
Zeros (%): 0.9%

Quantile statistics

Minimum: 0
5-th percentile: 0.1111110002
Q1: 0.243242994
Median: 0.4761900008
Q3: 0.875
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range: 0.631757006

Descriptive statistics

Standard deviation: 0.3248197734
Coef of variation: 0.6048713923
Kurtosis: -1.379292369
Mean: 0.5370063186
MAD: 0.287050277
Skewness: 0.2472924888
Sum: 15162.37402
Variance: 0.1055078804
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[0. 0.0121951 0.0251821 0.03871125 0.05195685 ... 0.94281 0.94590601 0.96362451 0.98936149 1. ], "bayesian blocks" binning strategy used)
Value                 Count  Frequency (%)
1                     6569   23.3%
0.25                  528    1.9%
0.5                   487    1.7%
0.200000003           464    1.6%
0.3333329856          444    1.6%
0.6666669846          330    1.2%
0.5714290142          311    1.1%
0.1428570002          284    1.0%
0.2142859995          278    1.0%
0.1666669995          278    1.0%
Other values (717)    18262  64.7%

Minimum 5 values

Value                 Count  Frequency (%)
0                     255    0.9%
0.02439020015         1      < 0.1%
0.02597399987         1      < 0.1%
0.02857140079         1      < 0.1%
0.02941180021         6      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     6569   23.3%
0.9787229896          2      < 0.1%
0.9750000238          9      < 0.1%
0.9705880284          1      < 0.1%
0.9666669965          1      < 0.1%
 

name_levenshtein_term_bin
Highly correlated

This variable is highly correlated with name_levenshtein_term and should be ignored for analysis

Correlation: 0.9805968951

name_trigram_simple
Highly correlated

This variable is highly correlated with name_levenshtein_simple_bin and should be ignored for analysis

Correlation: 0.9636431124

name_trigram_simple_bin
Highly correlated

This variable is highly correlated with name_trigram_simple and should be ignored for analysis

Correlation: 0.9845669582

name_trigram_term
Highly correlated

This variable is highly correlated with name_trigram_simple_bin and should be ignored for analysis

Correlation: 0.9446397504

name_trigram_term_bin
Highly correlated

This variable is highly correlated with name_trigram_term and should be ignored for analysis

Correlation: 0.9865189142

phone_equality
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
0      16369  58.0%
2      7863   27.8%
1      4003   14.2%

Max length: 1
Mean length: 1
Min length: 1
Contains chars: False
Contains digits: True
Contains spaces: False
Contains non-words: False

phone_levenshtein
Numeric

Distinct count: 29
Unique (%): 0.1%
Missing (%): 58.0%
Missing (n): 16371
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.8439607024
Minimum: 0.1000000015
Maximum: 1
Zeros (%): 0.0%

Quantile statistics

Minimum: 0.1000000015
5-th percentile: 0.2727272809
Q1: 0.6999999881
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 0.8999999762
Interquartile range: 0.3000000119

Descriptive statistics

Standard deviation: 0.2519796193
Coef of variation: 0.2985679507
Kurtosis: 0.4257274568
Mean: 0.8439607024
MAD: 0.2102880329
Skewness: -1.351725817
Sum: 10012.75
Variance: 0.06349372119
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=29)
Value                 Count  Frequency (%)
1                     7863   27.8%
0.5454545617          409    1.4%
0.4545454681          408    1.4%
0.8181818128          348    1.2%
0.6363636255          332    1.2%
0.3636363745          273    1.0%
0.9090909362          251    0.9%
0.5                   249    0.9%
0.6000000238          230    0.8%
0.7272727489          196    0.7%
Other values (18)     1305   4.6%
(Missing)             16371  58.0%

Minimum 5 values

Value                 Count  Frequency (%)
0.1000000015          89     0.3%
0.1666666716          9      < 0.1%
0.1818181872          146    0.5%
0.200000003           154    0.5%
0.25                  17     0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     7863   27.8%
0.9166666865          20     0.1%
0.9090909362          251    0.9%
0.8999999762          48     0.2%
0.8333333135          24     0.1%
 

phone_levenshtein_bin
Highly correlated

This variable is highly correlated with phone_levenshtein and should be ignored for analysis

Correlation: 0.9901385307

phone_trigram
Highly correlated

This variable is highly correlated with phone_levenshtein_bin and should be ignored for analysis

Correlation: 0.9515029939

phone_trigram_bin
Highly correlated

This variable is highly correlated with phone_trigram and should be ignored for analysis

Correlation: 0.9915532963

street_levenshtein_simple
Numeric

Distinct count: 1717
Unique (%): 6.1%
Missing (%): 70.8%
Missing (n): 19997
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.6922866106
Minimum: 0
Maximum: 1
Zeros (%): 0.1%

Quantile statistics

Minimum: 0
5-th percentile: 0.1666669995
Q1: 0.3857277632
Median: 0.8000000119
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range: 0.6142722368

Descriptive statistics

Standard deviation: 0.3100984395
Coef of variation: 0.4479336143
Kurtosis: -1.263106465
Mean: 0.6922866106
MAD: 0.2781056166
Skewness: -0.5044587851
Sum: 5703.057129
Variance: 0.09616104513
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     2845   10.1%
0.8571429849          358    1.3%
0.6666669846          220    0.8%
0.75                  210    0.7%
0.8000000119          203    0.7%
0.8888890147          145    0.5%
0.8333330154          79     0.3%
0.5                   73     0.3%
0.3333329856          62     0.2%
0.875                 53     0.2%
Other values (1706)   3990   14.1%
(Missing)             19997  70.8%

Minimum 5 values

Value                 Count  Frequency (%)
0                     22     0.1%
0.02777780034         1      < 0.1%
0.02857140079         1      < 0.1%
0.03333330154         2      < 0.1%
0.03846149892         1      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     2845   10.1%
0.9841269851          10     < 0.1%
0.9818180203          4      < 0.1%
0.9814810157          1      < 0.1%
0.9807689786          1      < 0.1%
 

street_levenshtein_simple_bin
Highly correlated

This variable is highly correlated with street_levenshtein_simple and should be ignored for analysis

Correlation: 0.977604212

street_levenshtein_term
Highly correlated

This variable is highly correlated with street_levenshtein_simple_bin and should be ignored for analysis

Correlation: 0.9253512513

street_levenshtein_term_bin
Highly correlated

This variable is highly correlated with street_levenshtein_term and should be ignored for analysis

Correlation: 0.9816441933

street_number_equality
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
2      14064  49.8%
1      13409  47.5%
0      762    2.7%

Max length: 1
Mean length: 1
Min length: 1
Contains chars: False
Contains digits: True
Contains spaces: False
Contains non-words: False

street_number_levenshtein
Highly correlated

This variable is highly correlated with street_number_equality and should be ignored for analysis

Correlation: 0.9128880932

street_number_levenshtein_bin
Highly correlated

This variable is highly correlated with street_number_levenshtein and should be ignored for analysis

Correlation: 0.9916964742

street_number_trigram
Highly correlated

This variable is highly correlated with street_number_levenshtein_bin and should be ignored for analysis

Correlation: 0.941348484

street_number_trigram_bin
Highly correlated

This variable is highly correlated with street_number_trigram and should be ignored for analysis

Correlation: 0.996232939

street_trigram_simple
Highly correlated

This variable is highly correlated with street_levenshtein_term_bin and should be ignored for analysis

Correlation: 0.931397809

street_trigram_simple_bin
Highly correlated

This variable is highly correlated with street_trigram_simple and should be ignored for analysis

Correlation: 0.9874532905

street_trigram_term
Highly correlated

This variable is highly correlated with street_trigram_simple_bin and should be ignored for analysis

Correlation: 0.9655146219

street_trigram_term_bin
Highly correlated

This variable is highly correlated with street_trigram_term and should be ignored for analysis

Correlation: 0.9886679289

website_levenshtein_simple
Numeric

Distinct count: 313
Unique (%): 1.1%
Missing (%): 93.6%
Missing (n): 26416
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.8158383965
Minimum: 0.174999997
Maximum: 1
Zeros (%): 0.0%

Quantile statistics

Minimum: 0.174999997
5-th percentile: 0.4113099873
Q1: 0.6153849959
Median: 0.9736840129
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 0.8249999881
Interquartile range: 0.3846150041

Descriptive statistics

Standard deviation: 0.2207053304
Coef of variation: 0.2705257833
Kurtosis: -0.8028762341
Mean: 0.8158383965
MAD: 0.1942105144
Skewness: -0.7789297104
Sum: 1484.01001
Variance: 0.04871084541
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
1                     906    3.2%
0.8571429849          38     0.1%
0.8888890147          31     0.1%
0.8000000119          30     0.1%
0.75                  29     0.1%
0.5                   22     0.1%
0.7142860293          21     0.1%
0.8412700295          20     0.1%
0.7272729874          19     0.1%
0.5066670179          18     0.1%
Other values (302)    685    2.4%
(Missing)             26416  93.6%

Minimum 5 values

Value                 Count  Frequency (%)
0.174999997           2      < 0.1%
0.2166340053          1      < 0.1%
0.2407049984          1      < 0.1%
0.2592230141          2      < 0.1%
0.2638890147          7      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     906    3.2%
0.9791669846          1      < 0.1%
0.9772729874          1      < 0.1%
0.9736840129          2      < 0.1%
0.9666669965          6      < 0.1%
 

website_levenshtein_simple_bin
Highly correlated

This variable is highly correlated with website_levenshtein_simple and should be ignored for analysis

Correlation: 0.9724771988

website_levenshtein_term
Highly correlated

This variable is highly correlated with website_levenshtein_simple_bin and should be ignored for analysis

Correlation: 0.9296893974

website_levenshtein_term_bin
Highly correlated

This variable is highly correlated with website_levenshtein_term and should be ignored for analysis

Correlation: 0.9853659607

website_trigram_simple
Highly correlated

This variable is highly correlated with website_levenshtein_term_bin and should be ignored for analysis

Correlation: 0.9424270326

website_trigram_simple_bin
Highly correlated

This variable is highly correlated with website_trigram_simple and should be ignored for analysis

Correlation: 0.9746094625

website_trigram_term
Highly correlated

This variable is highly correlated with website_trigram_simple_bin and should be ignored for analysis

Correlation: 0.9245284537

website_trigram_term_bin
Highly correlated

This variable is highly correlated with website_trigram_term and should be ignored for analysis

Correlation: 0.9853001957

zip_levenshtein_simple
Numeric

Distinct count: 20
Unique (%): 0.1%
Missing (%): 72.7%
Missing (n): 20539
Infinite (%): 0.0%
Infinite (n): 0
Mean: 0.8938370347
Minimum: 0
Maximum: 1
Zeros (%): 1.1%

Quantile statistics

Minimum: 0
5-th percentile: 0.200000003
Q1: 1
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range: 0

Descriptive statistics

Standard deviation: 0.2479177564
Coef of variation: 0.2773634791
Kurtosis: 5.726059437
Mean: 0.8938370347
MAD: 0.1646519005
Skewness: -2.591394424
Sum: 6878.969727
Variance: 0.06146321446
Memory size: 110.4 KiB

Histogram with fixed size bins (bins=20)
Value                 Count  Frequency (%)
1                     5968   21.1%
0.8000000119          737    2.6%
0                     299    1.1%
0.200000003           144    0.5%
0.6666669846          138    0.5%
0.25                  110    0.4%
0.6000000238          109    0.4%
0.5                   52     0.2%
0.8333330154          47     0.2%
0.400000006           42     0.1%
Other values (9)      50     0.2%
(Missing)             20539  72.7%

Minimum 5 values

Value                 Count  Frequency (%)
0                     299    1.1%
0.08333329856         4      < 0.1%
0.1666669995          5      < 0.1%
0.200000003           144    0.5%
0.2222220004          4      < 0.1%

Maximum 5 values

Value                 Count  Frequency (%)
1                     5968   21.1%
0.8333330154          47     0.2%
0.8000000119          737    2.6%
0.75                  24     0.1%
0.6666669846          138    0.5%
 

zip_levenshtein_simple_bin
Highly correlated

This variable is highly correlated with zip_levenshtein_simple and should be ignored for analysis

Correlation: 0.9674268288

zip_levenshtein_term
Highly correlated

This variable is highly correlated with zip_levenshtein_simple_bin and should be ignored for analysis

Correlation: 0.9590540554

zip_levenshtein_term_bin
Highly correlated

This variable is highly correlated with zip_levenshtein_term and should be ignored for analysis

Correlation: 0.9684604865

zip_trigram_simple
Highly correlated

This variable is highly correlated with zip_levenshtein_term and should be ignored for analysis

Correlation: 0.9081318961

zip_trigram_simple_bin
Highly correlated

This variable is highly correlated with zip_trigram_simple and should be ignored for analysis

Correlation: 0.9967649793

zip_trigram_term
Highly correlated

This variable is highly correlated with zip_trigram_simple_bin and should be ignored for analysis

Correlation: 0.9924286597

zip_trigram_term_bin
Highly correlated

This variable is highly correlated with zip_trigram_term and should be ignored for analysis

Correlation: 0.9967692543

Correlations

Missing values

Sample

First rows

city_levenshtein_simple city_levenshtein_simple_bin city_levenshtein_term city_levenshtein_term_bin city_trigram_simple city_trigram_simple_bin city_trigram_term city_trigram_term_bin fax_equality fax_levenshtein fax_levenshtein_bin fax_trigram fax_trigram_bin id is_match name_levenshtein_simple name_levenshtein_simple_bin name_levenshtein_term name_levenshtein_term_bin name_trigram_simple name_trigram_simple_bin name_trigram_term name_trigram_term_bin phone_equality phone_levenshtein phone_levenshtein_bin phone_trigram phone_trigram_bin street_levenshtein_simple street_levenshtein_simple_bin street_levenshtein_term street_levenshtein_term_bin street_number_equality street_number_levenshtein street_number_levenshtein_bin street_number_trigram street_number_trigram_bin street_trigram_simple street_trigram_simple_bin street_trigram_term street_trigram_term_bin website_levenshtein_simple website_levenshtein_simple_bin website_levenshtein_term website_levenshtein_term_bin website_trigram_simple website_trigram_simple_bin website_trigram_term website_trigram_term_bin zip_levenshtein_simple zip_levenshtein_simple_bin zip_levenshtein_term zip_levenshtein_term_bin zip_trigram_simple zip_trigram_simple_bin zip_trigram_term zip_trigram_term_bin
0NaN-1NaN-1NaN-1NaN-10NaN-1NaN-11204#120710.66666730.40000020.66666730.52631620NaN-1NaN-1NaN-1NaN-121.041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
1NaN-1NaN-1NaN-1NaN-10NaN-1NaN-11272#127910.66666730.41176520.66666730.44444420NaN-1NaN-1NaN-1NaN-121.041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
21.041.041.041.040NaN-1NaN-16258#625911.00000041.00000041.00000041.000000421.041.041.00000041.000000421.041.00000041.00000041.004NaN-1NaN-1NaN-1NaN-11.041.041.00000041.0000004
3NaN-1NaN-1NaN-1NaN-10NaN-1NaN-116076#16077-10.56547620.26087010.33333310.189189021.041.04NaN-1NaN-110.000.0000000NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
41.041.041.041.040NaN-1NaN-12666#267110.66666730.50000020.66666730.518519221.041.041.00000041.000000421.041.00000041.00000041.004NaN-1NaN-1NaN-1NaN-11.041.041.00000041.0000004
51.041.041.041.040NaN-1NaN-14402#440311.00000041.00000041.00000041.000000421.041.041.00000041.000000421.041.00000041.00000041.0041.041.041.041.041.041.041.00000041.0000004
61.041.041.041.040NaN-1NaN-14025#4028-10.12103200.18518500.00000000.00000000NaN-1NaN-10.38095210.428571210.520.25000010.33333310.160NaN-1NaN-1NaN-1NaN-10.840.840.33333310.3333331
7NaN-1NaN-1NaN-1NaN-10NaN-1NaN-14126#4138-10.35555610.33333310.25000010.15909100NaN-1NaN-1NaN-1NaN-110.630.2222221NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
81.041.041.041.040NaN-1NaN-14559#4560-10.16729800.17391300.00000000.00000000NaN-1NaN-1NaN-1NaN-11NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
91.041.041.041.040NaN-1NaN-115610#15611-10.20833310.26666710.00000000.00000000NaN-1NaN-1NaN-1NaN-11NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1

Last rows

city_levenshtein_simple city_levenshtein_simple_bin city_levenshtein_term city_levenshtein_term_bin city_trigram_simple city_trigram_simple_bin city_trigram_term city_trigram_term_bin fax_equality fax_levenshtein fax_levenshtein_bin fax_trigram fax_trigram_bin id is_match name_levenshtein_simple name_levenshtein_simple_bin name_levenshtein_term name_levenshtein_term_bin name_trigram_simple name_trigram_simple_bin name_trigram_term name_trigram_term_bin phone_equality phone_levenshtein phone_levenshtein_bin phone_trigram phone_trigram_bin street_levenshtein_simple street_levenshtein_simple_bin street_levenshtein_term street_levenshtein_term_bin street_number_equality street_number_levenshtein street_number_levenshtein_bin street_number_trigram street_number_trigram_bin street_trigram_simple street_trigram_simple_bin street_trigram_term street_trigram_term_bin website_levenshtein_simple website_levenshtein_simple_bin website_levenshtein_term website_levenshtein_term_bin website_trigram_simple website_trigram_simple_bin website_trigram_term website_trigram_term_bin zip_levenshtein_simple zip_levenshtein_simple_bin zip_levenshtein_term zip_levenshtein_term_bin zip_trigram_simple zip_trigram_simple_bin zip_trigram_term zip_trigram_term_bin
28225NaN-1NaN-1NaN-1NaN-10NaN-1NaN-12157#216810.85714340.56521720.85714340.63636430NaN-1NaN-1NaN-1NaN-121.00000041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
28226NaN-1NaN-1NaN-1NaN-10NaN-1NaN-12875#288210.40000020.13333300.40000020.238095121.041.04NaN-1NaN-121.00000041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
28227NaN-1NaN-1NaN-1NaN-10NaN-1NaN-12640#264410.28571410.09677400.28571410.12903200NaN-1NaN-1NaN-1NaN-121.00000041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
28228NaN-1NaN-1NaN-1NaN-10NaN-1NaN-11400#139610.66666730.45454520.66666730.47826120NaN-1NaN-1NaN-1NaN-11NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
28229NaN-1NaN-1NaN-1NaN-10NaN-1NaN-12018#202410.88888940.80645240.88888940.80645240NaN-1NaN-1NaN-1NaN-121.00000041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
282301.041.041.041.040NaN-1NaN-17148#714911.00000041.00000041.00000041.000000421.041.04NaN-1NaN-121.00000041.0000004NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
28231NaN-1NaN-1NaN-1NaN-10NaN-1NaN-18851#886010.72727330.62069030.72727330.62857130NaN-1NaN-1NaN-1NaN-11NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
282321.041.041.041.040NaN-1NaN-16468#6470-10.26388910.38461510.06666700.040000021.041.041.00000041.00421.00000041.00000041.0041.00004NaN-1NaN-1NaN-1NaN-10.000.000.000.00
282331.041.041.041.040NaN-1NaN-13516#3527-10.20857110.18750000.04000000.04166700NaN-1NaN-10.49763320.45210.66666730.33333310.3510.18750NaN-1NaN-1NaN-1NaN-11.041.041.041.04
28234NaN-1NaN-1NaN-1NaN-10NaN-1NaN-112899#12477-10.52777820.38095210.34782610.19444400NaN-1NaN-1NaN-1NaN-11NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1NaN-1
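
In the sample rows each `_bin` value follows its score, and the visible pairs (for example 0.666667 with 3, 0.400000 with 2, 0.189189 with 0, 1.0 with 4, NaN with -1) are consistent with five equal-width bins over [0, 1] plus -1 for missing scores. The sketch below encodes that reading. The bin edges and the function name `to_bin` are inferred from the sample rows only, so treat them as an assumption rather than the dataset's documented rule.

```python
import math
from bisect import bisect_right

# Inferred edges (assumption): [0, 0.2) -> 0, [0.2, 0.4) -> 1, ..., [0.8, 1] -> 4.
EDGES = [0.2, 0.4, 0.6, 0.8]

def to_bin(score: float) -> int:
    """Map a similarity score in [0, 1] to a bin in 0..4, or -1 if missing."""
    if math.isnan(score):
        return -1
    return bisect_right(EDGES, score)

# (score, bin) pairs read off the sample rows.
observed = [
    (0.666667, 3), (0.4, 2), (0.526316, 2), (0.565476, 2),
    (0.26087, 1), (0.333333, 1), (0.189189, 0), (0.0, 0),
    (0.6, 3), (0.8, 4), (1.0, 4), (float("nan"), -1),
]
mismatches = [(s, b) for s, b in observed if to_bin(s) != b]
print("mismatches:", mismatches)
```

A rule like this would also explain why each `_bin` column correlates so strongly with its score that the report rejects it.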